3. Mean, Median, Variance and Standard Deviation

Mean 

In the last section we saw that if savings and loan institutions are failing continuously at a rate of 5% per
year, then the associated probability density function is

$$f(x) = 0.05e^{-0.05x},$$

with domain $[0, +\infty)$. An interesting and important question to ask is: What is the average length of time
such an institution will last before failing? To answer this question, we use the following.



Mean or Expected Value 

If X is a continuous random variable with probability density function f defined on an interval with 
(possibly infinite) endpoints a and b, then the mean or expected value of X is 

$$E(X) = \int_a^b x f(x)\,dx.$$

E(X) is also called the average value of X. It is what we expect to get if we take the average of many 
values of X obtained in experiments. 



Example 1

Let X have probability density function given by

$$f(x) = 3x^2,$$

with domain [0, 1]. Find E(X).

Solution

We have

$$E(X) = \int_a^b x f(x)\,dx = \int_0^1 (x)(3x^2)\,dx = \int_0^1 3x^3\,dx = \left[\frac{3x^4}{4}\right]_0^1 = \frac{3}{4}.$$

Thus, the expected value of X is 3/4.

Before We Go On ... 

This reflects the fact that X is more likely to take on values in the right part of the interval [0,1] 
than the left part. The figure shows this probability density function. 




[Figure: graph of the density f(x) = 3x^2 on the interval [0, 1].]



We shall explain shortly why E(X) is given by the integral formula. 
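
The statement that E(X) is "what we expect to get if we take the average of many values of X" can be tried out by simulation. The following is a minimal sketch, not part of the original discussion. It assumes we generate values of X by inverse-transform sampling: since the cumulative distribution of $f(x) = 3x^2$ is $F(x) = x^3$, we take $X = U^{1/3}$ for a uniform random number U. The function names `sample_x` and `sample_mean` are our own.

```python
import random

def sample_x(rng):
    """Draw one value of X with density f(x) = 3x^2 on [0, 1].

    Assumption: inverse-transform sampling. The cumulative distribution
    is F(x) = x^3, so X = U**(1/3) for a uniform random number U.
    """
    return rng.random() ** (1.0 / 3.0)

def sample_mean(n, seed=0):
    """Average n simulated values of X."""
    rng = random.Random(seed)
    return sum(sample_x(rng) for _ in range(n)) / n

if __name__ == "__main__":
    # The averages should settle near E(X) = 3/4 as n grows.
    for n in (100, 10_000, 100_000):
        print(n, round(sample_mean(n), 4))
```

With more and more observations the printed averages should approach 3/4, although any single run will differ slightly because the values are random.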



Example 2 Failing S&Ls

Given that troubled S&Ls are failing continuously at a rate of 5% per year, how long will the 
average troubled S&L last? 

Solution 

If X is the number of years that a given S&L will last, we know that its probability density
function is $f(x) = 0.05e^{-0.05x}$. To answer the question we compute E(X).

$$E(X) = \int_a^b x f(x)\,dx = \int_0^{+\infty} 0.05xe^{-0.05x}\,dx = \lim_{M \to +\infty} \int_0^M 0.05xe^{-0.05x}\,dx.$$

Using integration by parts, we get

$$E(X) = \lim_{M \to +\infty} -0.05\left[e^{-0.05x}(20x + 400)\right]_0^M = (0.05)(400) = 20.$$

Thus, the expected lifespan of a troubled S&L is 20 years.
Before We Go On ... 

Notice that the answer, 20, is the reciprocal of the failure rate 0.05. This is true in general: if
$f(x) = ae^{-ax}$, then E(X) = 1/a.
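
To see the limit in Example 2 at work, we can approximate the truncated integral over [0, M] for increasing M and compare it with the antiderivative obtained by integration by parts. This is only a sketch: the midpoint rule and the names `truncated_mean` and `closed_form` are our own choices, not something prescribed by the text.

```python
import math

def truncated_mean(a, M, n=100_000):
    """Midpoint-rule approximation of the integral of x * a*e^(-a*x) over [0, M]."""
    dx = M / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * dx          # midpoint of the k-th subinterval
        total += x * a * math.exp(-a * x) * dx
    return total

def closed_form(a, M):
    """Antiderivative -e^(-a*x) * (x + 1/a), evaluated from 0 to M."""
    return 1.0 / a - math.exp(-a * M) * (M + 1.0 / a)

if __name__ == "__main__":
    a = 0.05  # failure rate from Example 2
    for M in (50, 100, 200, 400):
        print(M, round(truncated_mean(a, M), 4), round(closed_form(a, M), 4))
```

Both columns should creep up toward 1/a = 20 as M increases, which is the limit computed above.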



Question 

Why is E(X) given by that integral formula? 
Answer 

Suppose for simplicity that the domain of f is a finite interval [a, b]. Break up the interval into n
subintervals $[x_{k-1}, x_k]$, each of length $\Delta x$, as we did for Riemann sums. Now, the probability of seeing a
value of X in $[x_{k-1}, x_k]$ is approximately $f(x_k)\,\Delta x$ (the approximate area under the graph of f over
$[x_{k-1}, x_k]$). Think of this as the fraction of times we expect to see values of X in this range. These values, all
close to $x_k$, then contribute approximately $x_k f(x_k)\,\Delta x$ to the average, if we average together many
observations of X. Adding together all of these contributions, we get

$$E(X) \approx \sum_k x_k f(x_k)\,\Delta x.$$

Now these approximations get better as $n \to \infty$, and we notice that the sum above is a Riemann sum
converging to

$$E(X) = \int_a^b x f(x)\,dx,$$

which is the formula we have been using.
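
The convergence of these sums can be watched numerically. The sketch below evaluates the sum of $x_k f(x_k)\,\Delta x$ with right endpoints for the density $f(x) = 3x^2$ of Example 1. The helper name `riemann_mean` and the choice of right endpoints are assumptions made for illustration.

```python
def riemann_mean(f, a, b, n):
    """Right-endpoint Riemann sum: sum over k of x_k * f(x_k) * dx."""
    dx = (b - a) / n
    total = 0.0
    for k in range(1, n + 1):
        x_k = a + k * dx            # right endpoint of the k-th subinterval
        total += x_k * f(x_k) * dx  # contribution of values near x_k
    return total

def f(x):
    """Density from Example 1."""
    return 3 * x ** 2

if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(n, riemann_mean(f, 0.0, 1.0, n))
```

As n grows, the sums should approach the exact value E(X) = 3/4 found in Example 1.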
Question

What are the expected values of the standard distributions we discussed in the previous section?



Answer 

Let's compute them one by one. 



Mean of a Uniform Distribution 

If X is uniformly distributed on [a, b], then

$$E(X) = \frac{a + b}{2}.$$

This is not surprising, if you think about it for a minute. We'll leave the actual computation as one of the
exercises.



Mean of an Exponential Distribution 

If X has the exponential distribution function $f(x) = ae^{-ax}$, then

$$E(X) = \frac{1}{a}.$$



We saw how to compute this in Example 2. 



Mean of a Normal Distribution

If X is normally distributed with parameters $\mu$ and $\sigma$, then

$$E(X) = \mu.$$

This is why we called $\mu$ the mean, but we ought to do the calculation; it is given in a footnote.



Mean of a Beta Distribution 



If X has the beta distribution function $f(x) = (\beta + 1)(\beta + 2)x^{\beta}(1 - x)$, then

$$E(X) = \frac{\beta + 1}{\beta + 3}.$$



Again, we shall leave this as an exercise. 
Example 3 Downsizing in the Utilities Industry

A utilities industry consultant predicts a cutback in the Canadian Utilities industry during 2000-2005
by a percentage specified by a beta distribution with $\beta = 0.25$. What is the expected size of
the cutback by Ontario Hydro?

Solution

Since $\beta = 0.25$,

$$E(X) = \frac{\beta + 1}{\beta + 3} = \frac{1.25}{3.25} \approx 0.38.$$

Therefore, we can expect about a 38% cutback by Ontario Hydro.
Before We Go On ... 

What E(X) really tells us is that the average downsizing of many utilities will be 38%. Some will 
cut back more, and some will cut back less. 

There is a generalization of the mean that we shall use below. If X is a random variable on the interval
(a, b) with probability density function f, and if g is any function defined on that interval, then we can
define the expected value of g(X) to be

$$E(g(X)) = \int_a^b g(x) f(x)\,dx.$$

Thus, in particular, the mean is just the expected value of the function g(x) = x. We can interpret this
as the average we expect if we compute g(X) for many experimental values of X.
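
As a quick illustration, take the density $f(x) = 3x^2$ on [0, 1] from Example 1 together with $g(x) = x^2$. The choice of g here is ours, made only to show the formula in use:

```latex
E(X^2) = \int_0^1 x^2 \cdot 3x^2\,dx
       = \int_0^1 3x^4\,dx
       = \left[\frac{3x^5}{5}\right]_0^1
       = \frac{3}{5}.
```

So if we squared many observed values of X and averaged the results, we would expect to get about 0.6.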

Variance and Standard Deviation 

Statisticians use the variance and standard deviation of a continuous random variable X as a way of
measuring its dispersion, or the degree to which it is "scattered." The definitions are as follows.



Variance and Standard Deviation 

Let X be a continuous random variable with density function f defined on the interval (a, b), and let
$\mu = E(X)$ be the mean of X. Then the variance of X is given by

$$\mathrm{Var}(X) = E((X - \mu)^2) = \int_a^b (x - \mu)^2 f(x)\,dx.$$

The standard deviation of X is the square root of the variance,

$$\sigma(X) = \sqrt{\mathrm{Var}(X)}.$$
Notes

(1) In order to calculate the variance and standard deviation, we need first to calculate the mean.

(2) Var(X) is the expected value of the function $(x - \mu)^2$, which measures the square of the distance of
X from its mean. It is for this reason that Var(X) is sometimes called the mean square deviation, and
$\sigma(X)$ is called the root mean square deviation. Var(X) will be larger if X tends to wander far away from
its mean, and smaller if the values of X tend to cluster near its mean.

(3) The reason we take the square root in the definition of $\sigma(X)$ is that Var(X) is the expected value of
the square of the deviation from the mean, and thus is measured in square units. Its square root $\sigma(X)$
therefore gives us a measure in ordinary units.
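
Note (1) describes a two-step procedure: first the mean, then the variance. The following sketch carries it out numerically for the density $f(x) = 3x^2$ of Example 1. The midpoint rule and the helper names `expect` and `mean_var_sd` are assumptions of ours rather than anything taken from the text.

```python
import math

def expect(g, f, a, b, n=100_000):
    """Midpoint-rule approximation of E(g(X)), the integral of g(x) f(x) over [a, b]."""
    dx = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * dx
        total += g(x) * f(x)
    return total * dx

def mean_var_sd(f, a, b):
    """Step 1: the mean. Step 2: the variance about that mean. Then the square root."""
    mu = expect(lambda x: x, f, a, b)
    var = expect(lambda x: (x - mu) ** 2, f, a, b)
    return mu, var, math.sqrt(var)

if __name__ == "__main__":
    mu, var, sd = mean_var_sd(lambda x: 3 * x ** 2, 0.0, 1.0)
    print("mean", mu, "variance", var, "standard deviation", sd)
```

For this density the exact values work out to a mean of 3/4 and a variance of 3/80, so the printed numbers should agree with those to several decimal places.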



Question 

What are the variances and standard deviations of the standard distributions we discussed in the previous 
section? 

Answer 

Let's compute them one by one. We'll leave the actual computations (or special cases) for the exercises. 



Variance and Standard Deviation of a Uniform Distribution 

If X is uniformly distributed on [a, b], then

$$\mathrm{Var}(X) = \frac{(b - a)^2}{12}$$

and

$$\sigma(X) = \frac{b - a}{\sqrt{12}}.$$



Variance and Standard Deviation of an Exponential Distribution 

If X has the exponential distribution function $f(x) = ae^{-ax}$, then

$$\mathrm{Var}(X) = \frac{1}{a^2}$$

and

$$\sigma(X) = \frac{1}{a}.$$



Variance and Standard Deviation of a Normal Distribution 

If X is normally distributed with parameters $\mu$ and $\sigma$, then

$$\mathrm{Var}(X) = \sigma^2$$

and

$$\sigma(X) = \sigma.$$

(This is what you might have expected!)



Variance and Standard Deviation of a Beta Distribution 
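If X has the beta distribution function $f(x) = (\beta + 1)(\beta + 2)x^{\beta}(1 - x)$, then

$$\mathrm{Var}(X) = \frac{2(\beta + 1)}{(\beta + 4)(\beta + 3)^2}$$

and

$$\sigma(X) = \sqrt{\frac{2(\beta + 1)}{(\beta + 4)(\beta + 3)^2}}.$$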
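
The beta formulas can be checked with exact arithmetic, since every integral involved is of the form $\int_0^1 x^p\,dx = 1/(p + 1)$. The sketch below does this with Python's `fractions` module. It relies on the identity $\mathrm{Var}(X) = E(X^2) - \mu^2$, which follows from expanding $(x - \mu)^2$ in the definition but is not stated in the text, and the function names are our own.

```python
from fractions import Fraction

def beta_moment(beta, m):
    """Exact E(X^m) for f(x) = (beta+1)(beta+2) x^beta (1 - x) on [0, 1].

    Applies the integral of x^p over [0, 1] = 1/(p + 1) to the two terms
    of x^(beta+m) - x^(beta+m+1). beta may be an int or a Fraction.
    """
    beta = Fraction(beta)
    c = (beta + 1) * (beta + 2)
    return c * (1 / (beta + m + 1) - 1 / (beta + m + 2))

def beta_mean_var(beta):
    """Mean and variance, using Var(X) = E(X^2) - (E(X))^2 (an added identity)."""
    mean = beta_moment(beta, 1)
    var = beta_moment(beta, 2) - mean ** 2
    return mean, var

if __name__ == "__main__":
    for beta in (Fraction(1, 4), 1, 4):
        mean, var = beta_mean_var(beta)
        print("beta =", beta, " mean =", mean, " variance =", var)
```

For each value of $\beta$ tried, the results should coincide exactly with $(\beta + 1)/(\beta + 3)$ and $2(\beta + 1)/((\beta + 4)(\beta + 3)^2)$.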



You can see the significance of the standard deviation quite clearly in the normal distribution. As we
mentioned in the previous section, $\sigma$ is the distance from the maximum at $\mu$ to the points of inflection
at $\mu - \sigma$ and $\mu + \sigma$. The larger $\sigma$ is, the wider the bell. The following shows three normal distributions with
three different standard deviations (all with $\mu = 0.5$).

[Figure: normal density curves with $\mu = 0.5$, labeled $\sigma = 0.1$ and $\sigma = 0.5$.]

Again, a small standard deviation means that the values of X will be close to the mean with high
probability, while a large standard deviation means that the values may wander far away with high
probability.

Median 

The median income in the U.S. is the income M such that half the population earn incomes $\le M$ (so the
other half earn incomes $\ge M$). In terms of probability, we can think of income as a random variable X.
Then the probability that $X \le M$ is 1/2 and the probability that $X \ge M$ is also 1/2.



Median 

Let X be a continuous random variable. The median of X is the 
number M such that 

$$P(X \le M) = \frac{1}{2}.$$

Then $P(M \le X) = 1/2$ also.
If f is the probability density function for X and f is defined on (a, b), then we can calculate M by
solving the equation

$$P(a \le X \le M) = \int_a^M f(x)\,dx = \frac{1}{2}$$

for M. Graphically, the vertical line x = M divides the total area under the graph of f into two equal
parts. (See the figure.)




Question 

What is the difference between the median and the mean? 
Answer 

Roughly speaking, the median divides the area under the distribution curve into two equal parts while 
the mean is the value of X at which the graph would balance. If a probability curve has as much area to 
the left of the mean as to the right, then the mean is equal to the median. This is true of uniform and 
normal distributions, which are symmetric about their means. On the other hand, the medians and means 
are different for the exponential distributions and most of the beta distributions, because their areas are 
not distributed symmetrically. 



Example 4 Lines at the Post Office

The time in minutes between individuals joining the line at an Ottawa Post Office is a random
variable with the exponential distribution

$$f(x) = 2e^{-2x} \quad (x \ge 0).$$

Find the mean and median time between individuals joining the line and interpret the answers. 
Solution 

The expected value for an exponential distribution $f(x) = ae^{-ax}$ is 1/a. Here, a = 2, so E(X) = 1/2.
We interpret this to mean that, on average, a new person will join the line every half a minute, or
30 seconds. For the median, we must solve

$$\int_0^M f(x)\,dx = \frac{1}{2}.$$

That is,

$$\int_0^M 2e^{-2x}\,dx = \frac{1}{2}.$$

Evaluating the integral gives

$$-\left[e^{-2x}\right]_0^M = \frac{1}{2},$$

or

$$1 - e^{-2M} = \frac{1}{2},$$

so

$$e^{-2M} = \frac{1}{2},$$

or

$$-2M = \ln(1/2) = -\ln 2.$$

Thus,

$$M = \frac{\ln 2}{2} \approx 0.3466 \text{ minutes}.$$

This means that half the people get in line less than 0.3466 minutes (about 21 seconds) after the 
previous person, while half arrive more than 0.3466 minutes later. The mean time for a new 
person to arrive in line is larger than this because there are some occasional long waits between 
people, and these pull the average up. 
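
The gap between the mean and the median can also be seen in simulated data. The sketch below is our own illustration, not part of the example: it draws waiting times with Python's `random.expovariate` and compares the sample mean and sample median with the values 1/2 and (ln 2)/2 computed above. The names `simulate_gaps` and `summarize`, the sample size, and the seed are arbitrary choices.

```python
import math
import random
import statistics

def simulate_gaps(rate, n, seed=0):
    """Simulate n waiting times with density rate * e^(-rate * x)."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]

def summarize(gaps):
    """Return (sample mean, sample median) of the simulated waiting times."""
    return sum(gaps) / len(gaps), statistics.median(gaps)

if __name__ == "__main__":
    mean, median = summarize(simulate_gaps(2.0, 100_000))
    print("sample mean  ", round(mean, 4), " theory", 0.5)
    print("sample median", round(median, 4), " theory", round(math.log(2) / 2, 4))
```

The sample median should come out below the sample mean, reflecting the occasional long waits that pull the average up.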



Sometimes we cannot solve the equation $\int_a^M f(x)\,dx = 1/2$ for M analytically, as the next example
shows.



Example 5

Find the median of the random variable with beta density function for $\beta = 4$.

Solution

Here,

$$f(x) = (\beta + 1)(\beta + 2)x^{\beta}(1 - x) = 30x^4(1 - x).$$

Thus we must solve

$$\int_0^M 30x^4(1 - x)\,dx = \frac{1}{2}.$$

That is,

$$30\int_0^M (x^4 - x^5)\,dx = \frac{1}{2}.$$

So

$$30\left[\frac{M^5}{5} - \frac{M^6}{6}\right] = \frac{1}{2}$$

or, multiplying through and clearing denominators,

$$12M^5 - 10M^6 - 1 = 0.$$

This is a degree-six polynomial equation that has no easy factorization. Since there is no general
analytical method for obtaining the solution, the only method we can use is numerical. The figure
shows three successive views of a graphing calculator plot of Y = 12X^5 - 10X^6 - 1, obtained by
zooming in towards one of the zeros.

[Figure: three successive graphing calculator views of Y = 12X^5 - 10X^6 - 1.]

We are interested only in the zero that occurs between 0 and 1 (why?), and find that $M \approx 0.735$ to
within ±0.001.
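
Zooming in on a graph is one numerical method; another is bisection, which repeatedly halves an interval on which the function changes sign. The sketch below applies it to $12M^5 - 10M^6 - 1$ on [0, 1]. Bisection is our substitute for the calculator zoom, not the method used in the text, and the names `p` and `bisect` are hypothetical.

```python
def p(m):
    """Left-hand side of the median equation 12M^5 - 10M^6 - 1 = 0."""
    return 12 * m ** 5 - 10 * m ** 6 - 1

def bisect(func, lo, hi, tol=1e-6):
    """Bisection: assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid   # sign change lies in the right half
        else:
            hi = mid                # sign change lies in the left half
    return (lo + hi) / 2

if __name__ == "__main__":
    print("median is approximately", bisect(p, 0.0, 1.0))
```

Since p(0) = -1 and p(1) = 1, the starting interval [0, 1] is valid, and the result should agree with the value of about 0.735 read off the graph.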

Before We Go On ... 

Question

This method required us first to calculate $\int_a^M f(x)\,dx$ analytically. What if even this is impossible
to do?

Answer

We can solve the equation $\int_a^M f(x)\,dx = 1/2$ graphically by having the calculator compute and
graph this function of M by numerical integration. For example, to redo the above example on the
TI-83 or compatible models, enter

Y1 = fnInt(30T^4(1-T),T,0,X)-0.5

which corresponds to

$$y = \int_0^x 30t^4(1 - t)\,dt - \frac{1}{2},$$

a function of x. Since the median M is the solution obtained by setting y = 0, we can obtain the
answer by plotting Y1 and finding its x-intercept. The plot should cross the x-axis at the same point
as the one we obtained above (why?).
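
The same idea can be sketched away from the calculator: integrate the density numerically, subtract 1/2, and look for the zero. In the sketch below Simpson's rule plays the role of fnInt and bisection replaces reading the x-intercept off the plot. Both are our own choices, and the names `simpson`, `y`, and `median` are hypothetical.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def y(x, f, a=0.0):
    """Analogue of Y1: numerical integral of f from a to x, minus 1/2."""
    return simpson(f, a, x) - 0.5

def median(f, a, b, tol=1e-6):
    """Find the zero of y on [a, b] by bisection (y increases because f >= 0)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if y(mid, f, a) < 0:
            lo = mid    # less than half the area lies to the left of mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    density = lambda t: 30 * t ** 4 * (1 - t)   # beta density with beta = 4
    print("median is approximately", median(density, 0.0, 1.0))
```

No antiderivative is needed here, so the same sketch could be pointed at a density whose integral cannot be found analytically.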
We would welcome comments and suggestions for improving this resource.

Mail us at:

Stefan Waner (matszw@hofstra.edu), Steven R. Costenoble (matsrc@hofstra.edu)